
    Privileged and Unprivileged Groups: An Empirical Study on the Impact of the Age Attribute on Fairness


    On the Use of Evaluation Measures for Defect Prediction Studies

    Software defect prediction research has adopted various evaluation measures to assess the performance of prediction models. In this paper, we further stress the importance of choosing appropriate measures in order to correctly assess the strengths and weaknesses of a given defect prediction model, especially given that most defect prediction tasks suffer from data imbalance. Investigating 111 previous studies published between 2010 and 2020, we found that over half either use only one evaluation measure, which alone cannot express all the characteristics of model performance in the presence of imbalanced data, or a set of binary measures which are prone to bias when used to assess models trained with imbalanced data. We also unveil the magnitude of the impact of assessing popular defect prediction models with several evaluation measures based, for the first time, on both statistical significance tests and effect size analyses. Our results reveal that the evaluation measures produce a different ranking of the classification models in 82% and 85% of the cases studied according to the Wilcoxon statistical significance test and the Â12 effect size, respectively. Further, we observe a very high rank disruption (between 64% and 92% on average) for each of the measures investigated. This signifies that, in the majority of cases, a prediction technique believed to be better than others when using a given evaluation measure becomes worse when using a different one. We conclude by providing some recommendations for selecting appropriate evaluation measures based on factors specific to the problem at hand, such as the class distribution of the training data and the way in which the model has been built and will be used. Moreover, we recommend including in the set of evaluation measures at least one able to capture the full picture of the confusion matrix, such as MCC. This will enable researchers to assess whether proposals made in previous work can be applied for purposes different from the ones they were originally intended for. Besides, we recommend reporting, whenever possible, the raw confusion matrix to allow other researchers to compute any measure of interest, thereby making it feasible to draw meaningful observations across different studies.
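
    As a hedged illustration of the final recommendation, the sketch below computes MCC directly from the four cells of a raw binary confusion matrix; the counts are made up for the example and show how a model can look strong on accuracy while MCC tells a different story.

```python
# Minimal sketch: computing MCC from a raw (binary) confusion matrix.
# The counts below are illustrative, not taken from the study.
import math

def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den != 0 else 0.0  # conventionally 0 when undefined

# On imbalanced data a model can score high accuracy yet a much lower MCC,
# which is why reporting the full confusion matrix matters.
tp, fp, fn, tn = 20, 5, 30, 945
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"accuracy = {accuracy:.3f}, MCC = {mcc(tp, fp, fn, tn):.3f}")
```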

    On the Relationship Between Story Point and Development Effort in Agile Open-Source Software

    Background: Previous work has provided some initial evidence that Story Points (SP) estimated by human experts may not accurately reflect the effort needed to realise Agile software projects. / Aims: In this paper, we aim to shed further light on the relationship between SP and Agile software development effort to understand the extent to which human-estimated SP is a good indicator of user story development effort, expressed in terms of the time needed to realise it. / Method: To this end, we carry out a thorough empirical study involving a total of 37,440 unique user stories from 37 different open-source projects publicly available in the TAWOS dataset. For these user stories, we investigate the correlation between the issue development time (or its approximation when the actual time is not available) and the SP estimated by human experts, using three widely used correlation statistics (i.e., Pearson, Kendall and Spearman). Furthermore, we investigate the SP estimations made by the human experts in order to assess the extent to which they are consistent throughout the project, i.e., we assess whether the development time of the issues is proportionate to the SP assigned to them. / Results: The average results across the three correlation measures reveal that the correlation between the human-estimated SP and the approximated development time is strong for only 7% of the projects investigated, and medium (58%) or low (35%) for the remaining ones. Similar results are obtained when the actual development time is considered. Our empirical study also reveals that the estimation made is often not consistent throughout the project, and the human estimator tends to misestimate in 78% of the cases. / Conclusions: Our empirical results suggest that SP might not be an accurate indicator of open-source Agile software development effort expressed in terms of development time. The impact of its use as an indicator of effort should be explored in future work, for example as a cost-driver in automated effort estimation models or as the prediction target.
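
    For readers unfamiliar with the three statistics, the following minimal sketch shows how such correlations can be computed with SciPy; the story point and development time values are illustrative, not taken from TAWOS.

```python
# Minimal sketch: the three correlation statistics named in the abstract,
# computed with SciPy on toy data (story points vs. development hours).
from scipy.stats import kendalltau, pearsonr, spearmanr

story_points = [1, 2, 3, 5, 8, 13, 5, 3]
dev_hours    = [4, 5, 9, 20, 18, 60, 7, 30]  # illustrative values only

print("Pearson :", pearsonr(story_points, dev_hours)[0])
print("Kendall :", kendalltau(story_points, dev_hours)[0])
print("Spearman:", spearmanr(story_points, dev_hours)[0])
```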

    Search-based approaches for software development effort estimation

    Effort estimation is a critical activity for planning and monitoring software project development and for delivering the product on time and within budget. Significant over- or under-estimates expose a software project to several risks. As a matter of fact, under-estimates could lead to the addition of manpower to a late software project, making the project later (Brooks’s Law), or to the cancellation of activities, such as documentation and testing, negatively impacting software quality and maintainability. Thus, the competitiveness of a software company heavily depends on the ability of its project managers to accurately predict in advance the effort required to develop software systems. However, several challenges exist in making accurate estimates, e.g., the estimation is needed early in the software lifecycle, when little information about the project is available, and several factors can impact project effort, with these factors usually being specific to different production contexts. Several techniques have been proposed in the literature to support project managers in estimating software project development effort. In recent years the use of Search-Based (SB) approaches has been suggested as an effort estimation technique. These approaches include a variety of meta-heuristics, such as local search techniques (e.g., Hill Climbing, Tabu Search, Simulated Annealing) or Evolutionary Algorithms (e.g., Genetic Algorithms, Genetic Programming). The idea underlying the use of such techniques is the reformulation of software engineering problems as search or optimization problems whose goal is to find the most appropriate solutions conforming to some adequacy criteria (i.e., problem goals). In particular, the use of SB approaches in the context of effort estimation is twofold: they can be exploited to build effort estimation models or to enhance the use of existing effort estimation techniques. The usage of SB approaches for effort estimation reported in the literature has provided promising results that encourage further investigation. However, these can be considered preliminary studies: the capabilities of the approaches were not fully exploited, and the empirical analyses employed did not consider the more recent recommendations on how to carry out this kind of empirical assessment in the effort estimation and SBSE contexts. The main aim of the PhD dissertation is to provide insight into the use of SB techniques for effort estimation, highlighting the strengths and weaknesses of these approaches for both of the uses mentioned above. [edited by Author]
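
    To make the reformulation concrete, here is a toy sketch (not taken from the dissertation) of one of the local search techniques mentioned, Hill Climbing, searching the coefficients of a simple linear effort model with MMRE as the fitness function; the project data and the neighbourhood scheme are illustrative assumptions.

```python
# Toy hill-climbing search over the coefficients of a linear effort model
# effort = a * size + b, minimising MMRE. Data and moves are illustrative.
import random

projects = [(10, 35), (25, 80), (40, 150), (60, 210)]  # (size, actual effort)

def mmre(a, b):
    """Mean Magnitude of Relative Error of the model a*size + b."""
    return sum(abs(act - (a * s + b)) / act for s, act in projects) / len(projects)

random.seed(0)
a, b = 1.0, 0.0
best = mmre(a, b)
for _ in range(10_000):
    na, nb = a + random.uniform(-0.1, 0.1), b + random.uniform(-1, 1)
    score = mmre(na, nb)
    if score < best:          # accept only improving neighbours
        a, b, best = na, nb, score

print(f"effort ≈ {a:.2f}*size + {b:.2f}, MMRE = {best:.3f}")
```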

    Py2Cy: A Genetic Improvement Tool To Speed Up Python

    Due to its ease of use and wide range of custom libraries, Python has quickly gained popularity and is used by a wide range of developers all over the world. While Python allows for fast writing of source code, the resulting programs are slow to execute when compared to programs written in other programming languages such as C. One of the reasons for its slow execution time is the dynamic typing of variables. Cython is an extension to Python which can achieve execution speed-ups through compiler optimization. One possibility for improvement is the use of static typing, which developers can add to Python scripts. To alleviate the need for manual effort, we create Py2Cy, a Genetic Improvement tool for automatically converting Python scripts to statically typed Cython scripts. To show the feasibility of improving runtime with Py2Cy, we optimize a Python script for generating Fibonacci numbers. The results show that Py2Cy is able to speed up the execution time by up to a factor of 18.
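
    The kind of transformation Py2Cy automates can be sketched as follows: the first function is plain, dynamically typed Python, while the second adds Cython static type annotations (in Cython's pure-Python notation) and only gains speed once compiled with Cython. The exact annotations Py2Cy inserts, and the resulting speed-up, will differ; the factor of 18 above refers to the paper's Fibonacci case study.

```python
# Illustrative only: a dynamically typed Python function and a Cython
# statically typed variant (pure-Python annotation style). Requires the
# cython package; the typed version must be compiled to run faster.
import cython

def fib_py(n):
    # dynamically typed: every operation goes through Python objects
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_cy(n: cython.int) -> cython.longlong:
    # statically typed: Cython can translate the loop to plain C arithmetic
    a: cython.longlong = 0
    b: cython.longlong = 1
    i: cython.int
    for i in range(n):
        a, b = b, a + b
    return a
```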

    MEG: Multi-objective Ensemble Generation for Software Defect Prediction

    Background: Defect Prediction research aims at assisting software engineers in the early identification of software defects during the development process. A variety of automated approaches, ranging from traditional classification models to more sophisticated learning approaches, have been explored to this end. Among these, recent studies have proposed the use of ensemble prediction models (i.e., aggregations of multiple base classifiers) to build more robust defect prediction models. / Aims: In this paper, we introduce a novel approach based on multi-objective evolutionary search to automatically generate defect prediction ensembles. Our proposal is not only novel with respect to the more general area of evolutionary generation of ensembles, but it also advances the state of the art in the use of ensembles in defect prediction. / Method: We assess the effectiveness of our approach, dubbed Multi-objective Ensemble Generation (MEG), by empirically benchmarking it against the most closely related proposals we found in the literature on defect prediction ensembles and on multi-objective evolutionary ensembles (which, to the best of our knowledge, had never previously been applied to defect prediction). / Results: Our results show that MEG is able to generate ensembles which produce similar or more accurate predictions than those achieved by all the other approaches considered in 73% of the cases (with favourable large effect sizes in 80% of them). / Conclusions: MEG is not only able to generate ensembles that yield more accurate defect predictions with respect to the benchmarks considered, but it also does so automatically, thus relieving engineers of the burden of manual design and experimentation.
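
    The core idea can be illustrated with a deliberately tiny, enumerative toy example in scikit-learn (rather than MEG's evolutionary search): explore the space of candidate ensembles while optimising more than one objective at once, here recall on the defective and on the clean class, and keep the Pareto-non-dominated ensembles. Base learners, data and objectives below are placeholders, not the paper's configuration.

```python
# Toy sketch of multi-objective ensemble generation: enumerate small
# ensembles (MEG instead evolves them), score each on two objectives,
# and keep the Pareto-non-dominated ones. All components are placeholders.
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0))]

def objectives(members):
    """Two objectives: recall on the defective (1) and the clean (0) class."""
    ens = VotingClassifier(estimators=list(members)).fit(X_tr, y_tr)
    pred = ens.predict(X_te)
    return (recall_score(y_te, pred, pos_label=1),
            recall_score(y_te, pred, pos_label=0))

candidates = [c for r in range(2, len(base) + 1) for c in combinations(base, r)]
scored = [(objectives(c), [name for name, _ in c]) for c in candidates]
pareto = [(s, names) for s, names in scored
          if not any(o[0] >= s[0] and o[1] >= s[1] and o != s for o, _ in scored)]
print(pareto)  # non-dominated ensembles and their member names
```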

    Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Replication Study

    In the last decade, several studies have explored automated techniques to estimate the effort of agile software development. We perform a close replication and extension of a seminal work proposing the use of Deep Learning for Agile Effort Estimation (namely Deep-SE), which has since set the state of the art. Specifically, we replicate three of the original research questions aiming at investigating the effectiveness of Deep-SE for both within-project and cross-project effort estimation. We benchmark Deep-SE against three baselines (i.e., Random, Mean and Median effort estimators) and a previously proposed method to estimate agile software project development effort (dubbed TF/IDF-SVM), as done in the original study. To this end, we use the data from the original study and an additional dataset of 31,960 issues mined from TAWOS, as using more data allows us to strengthen the confidence in the results and to further mitigate external validity threats. The results of our replication show that Deep-SE outperforms the Median baseline estimator and TF/IDF-SVM in only very few cases with statistical significance (8/42 and 9/32 cases, respectively), thus contradicting previous findings on the efficacy of Deep-SE. The two additional RQs revealed that neither augmenting the training set nor pre-training Deep-SE leads to an improvement in its accuracy and convergence speed. These results suggest that using semantic similarity is not enough to differentiate user stories with respect to their story points; thus, future work has yet to explore and find new techniques and features that obtain accurate agile software development effort estimates.
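
    The three naive baselines referenced above admit a very small sketch; the MAE computation and toy story point values below are for illustration only and do not reproduce the study's benchmark.

```python
# Minimal sketch of the naive baselines used in the benchmark (Random, Mean
# and Median estimators), evaluated with Mean Absolute Error on toy data.
import random
import statistics

train_sp = [1, 2, 3, 5, 8, 13, 2, 3, 5]     # illustrative story points
test_sp  = [2, 5, 8, 3]

def mae(preds, actuals):
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)

random.seed(0)
baselines = {
    "Random": [random.choice(train_sp) for _ in test_sp],  # sample from training SPs
    "Mean":   [statistics.mean(train_sp)] * len(test_sp),
    "Median": [statistics.median(train_sp)] * len(test_sp),
}
for name, preds in baselines.items():
    print(f"{name:>6}: MAE = {mae(preds, test_sp):.2f}")
```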

    An Empirical Study on the Fairness of Pre-trained Word Embeddings

    Pre-trained word embedding models are easily distributed and applied, as they spare users the effort of training models themselves. With widely distributed models, it is important to ensure that they do not exhibit undesired behaviour, such as biases against population groups. For this purpose, we carry out an empirical study evaluating the bias of 15 publicly available, pre-trained word embedding models based on three training algorithms (GloVe, word2vec, and fastText) with regard to four bias metrics (WEAT, SEMBIAS, DIRECT BIAS, and ECT). The choice of word embedding models and bias metrics is motivated by a literature survey of 37 publications which quantified bias in pre-trained word embeddings. Our results indicate that fastText is the least biased model (in 8 out of 12 cases) and that small vector lengths lead to higher bias.
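
    As a hedged illustration of one of the four metrics, the sketch below computes a WEAT-style test statistic from cosine similarities; a real evaluation would load pre-trained vectors (e.g. with gensim), whereas the tiny vectors here are placeholders.

```python
# Minimal sketch of a WEAT-style association score. The 3-dimensional
# vectors below are placeholders, not real word embeddings.
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B):
    """Differential association of word vector w with attribute sets A and B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_statistic(X, Y, A, B):
    """WEAT test statistic: total association of target set X minus target set Y."""
    return sum(assoc(x, A, B) for x in X) - sum(assoc(y, A, B) for y in Y)

# Placeholder target (X, Y) and attribute (A, B) word vectors.
X = [np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.1])]
Y = [np.array([0.1, 0.9, 0.0]), np.array([0.2, 0.8, 0.1])]
A = [np.array([1.0, 0.0, 0.0])]
B = [np.array([0.0, 1.0, 0.0])]
print("WEAT statistic:", weat_statistic(X, Y, A, B))
```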

    Investigating the Effectiveness of Clustering for Story Point Estimation

    Automated techniques to estimate Story Points (SP) for user stories in agile software development came to the fore a decade ago. Yet, the accuracy of state-of-the-art estimation techniques has room for improvement. In this paper, we present a new approach for SP estimation based on analysing textual features of software issues using latent Dirichlet allocation (LDA) and clustering. We first use LDA to represent issue reports in a new space of generated topics. We then use hierarchical clustering to agglomerate issues into clusters based on their topic similarities. Next, we build estimation models using the issues in each cluster. Finally, we find the cluster closest to a newly arriving issue and use the model from that cluster to estimate its SP. Our approach is evaluated on a dataset of 26 open source projects with a total of 31,960 issues and compared against both baselines and state-of-the-art SP estimation techniques. The results show that the estimation performance of our proposed approach is as good as the state of the art. However, none of these approaches is statistically significantly better than more naive estimators in all cases, which does not justify their additional complexity. We therefore encourage future work to develop alternative strategies for story point estimation. The experimental data and scripts we used in this work are publicly available to allow for replication and extension.
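
    A minimal scikit-learn sketch of the pipeline described above follows; the toy issue corpus, the number of topics and clusters, and the per-cluster regressor are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: LDA topics over issue text, hierarchical clustering of the topic
# vectors, one regressor per cluster, and estimation of a new issue via its
# closest cluster. Data and parameters are placeholders.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression

issues = ["fix login crash on null token", "add login via oauth provider",
          "improve query performance on dashboard", "dashboard chart renders slowly",
          "update user profile validation", "validate email field in profile form"]
story_points = [3, 5, 8, 5, 2, 1]

counts = CountVectorizer().fit(issues)
topics = LatentDirichletAllocation(n_components=3, random_state=0)
X = topics.fit_transform(counts.transform(issues))          # issues in topic space

labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
models, centroids = {}, {}
for c in set(labels):
    idx = np.where(labels == c)[0]
    models[c] = LinearRegression().fit(X[idx], np.array(story_points)[idx])
    centroids[c] = X[idx].mean(axis=0)

new_issue = ["dashboard loads slowly for large queries"]
x = topics.transform(counts.transform(new_issue))
closest = min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
print("estimated SP:", float(models[closest].predict(x)[0]))
```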

    How Do Android Developers Improve Non-Functional Properties of Software?

    Nowadays there is increased pressure on mobile app developers to take non-functional properties into account. An app that is too slow or uses too much bandwidth will decrease user satisfaction, and thus can lead to users simply abandoning the app. Although automated software improvement techniques exist for traditional software, they are not as prevalent in the mobile domain. Moreover, it is not yet known whether the same software changes would be as effective. With that in mind, we mined 100 Android repositories to find out how developers improve the execution time, memory consumption, bandwidth usage and frame rate of mobile apps. We categorised non-functional property (NFP) improving commits related to performance to see how existing automated software improvement techniques can be improved. Our results show that although NFP-improving commits related to performance are rare, such improvements appear throughout the development lifecycle. We found altogether 560 NFP-improving commits out of a total of 74,408 commits analysed. Memory consumption is sacrificed most often when improving execution time or bandwidth usage, although similar types of changes can improve multiple non-functional properties at once. Code deletion is the most frequently utilised strategy, except for frame rate, where an increase in concurrency is the dominant strategy. We find that automated software improvement techniques for the mobile domain can benefit from the addition of SQL query improvement, caching and asset manipulation. Moreover, we provide a classifier which can drastically reduce the manual effort needed to analyse NFP-improving commits.
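
    In the spirit of the classifier mentioned in the last sentence, even a simple keyword-based filter over commit messages can cut down the number of commits that need manual inspection; the keyword lists below are illustrative assumptions and are not the classifier released with the paper.

```python
# Hedged sketch: flag commit messages that may improve a non-functional
# property, so only flagged commits need manual review. Keywords are
# illustrative, not the paper's trained classifier.
NFP_KEYWORDS = {
    "execution time":     ["speed up", "faster", "optimize", "reduce latency"],
    "memory consumption": ["memory leak", "reduce memory", "footprint"],
    "bandwidth usage":    ["cache", "compress", "reduce download", "bandwidth"],
    "frame rate":         ["fps", "frame rate", "jank", "smooth scrolling"],
}

def classify_commit(message: str) -> list[str]:
    """Return the non-functional properties a commit message appears to target."""
    msg = message.lower()
    return [prop for prop, words in NFP_KEYWORDS.items()
            if any(w in msg for w in words)]

print(classify_commit("Cache thumbnails to reduce download size and speed up list view"))
# -> ['execution time', 'bandwidth usage']
```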